Using LSA to Automatically Identify Givenness and Newness of Noun Phrases in Written Discourse

Authors

  • Christian F. Hempelmann
  • David Dufty
  • Philip M. McCarthy
  • Arthur C. Graesser
  • Zhiqiang Cai
  • Danielle S. McNamara
Abstract

Identifying given and new information within a text has long been addressed as a research issue. However, there has previously been no accurate computational method for assessing the degree to which constituents in a text contain given versus new information. This study develops a method for automatically categorizing noun phrases into one of three categories of givenness/newness, using the taxonomy of Prince (1981) as the gold standard. The central computational technique used is span (Hu et al., 2003), a derivative of latent semantic analysis (LSA). We analyzed noun phrases from two expository and two narrative texts. Predictors of newness included span as well as pronoun status, determiners, and word overlap with previous noun phrases. Logistic regression showed that span was superior to LSA in categorizing noun phrases, producing an increase in accuracy from 74% to 80%.
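The abstract does not spell out how span is computed, so the following is only a minimal sketch of one common reading of the idea: the vector of a noun phrase is split into the part that lies inside the subspace spanned by the vectors of the preceding text (treated as given) and the perpendicular remainder (treated as new). The function names, the toy three-dimensional vectors, and the ratio `new / (new + given)` are assumptions made for illustration; they are not taken from the paper, and real input would be LSA vectors from a trained semantic space.

```python
import math

def _dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def _orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            c = _dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(_dot(w, w))
        if norm > tol:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis

def span_given_new(prior_vectors, target):
    """Split `target` into its projection onto the span of
    `prior_vectors` (given part) and the remainder (new part)."""
    given = [0.0] * len(target)
    for b in _orthonormal_basis(prior_vectors):
        c = _dot(target, b)
        given = [g + c * bi for g, bi in zip(given, b)]
    new = [t - g for t, g in zip(target, given)]
    return given, new

def newness_ratio(prior_vectors, target):
    """Assumed score in [0, 1]: 0 = fully given, 1 = fully new."""
    given, new = span_given_new(prior_vectors, target)
    g = math.sqrt(_dot(given, given))
    n = math.sqrt(_dot(new, new))
    return n / (n + g) if (n + g) > 0 else 0.0

# Toy vectors standing in for LSA vectors of earlier noun phrases.
prior = [[1, 0, 0], [1, 1, 0]]
print(newness_ratio(prior, [0, 2, 0]))  # lies inside the prior span
print(newness_ratio(prior, [0, 0, 3]))  # perpendicular to the prior span
```

A score of this kind could then serve as one predictor in a logistic regression alongside pronoun status, determiners, and word overlap, which is the role the abstract assigns to span.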

Similar resources

Newness and Givenness of Information : Automated Identification in Written Discourse

The identification of new versus given information within a text has been frequently investigated by researchers of language and discourse. Despite theoretical advances, an accurate computational method for assessing the degree to which a text contains new versus given information has not previously been implemented. This study discusses a variety of computational new/given systems and analyzes...

Informational Status and Pitch Accent Distribution in Spontaneous Dialogues in English

Revealing the relations between pitch accent types and the informational status of words requires a refined discourse analysis of spontaneous speech. A cooperative unscripted task in which subjects gave instructions for decorating Christmas trees successfully induced production of target adjective-noun pairs conveying new/given and contrastive information. Adapting Grosz and Sidner’s intention-...

Corpus-Based Identification of Non-Anaphoric Noun Phrases

Coreference resolution involves finding antecedents for anaphoric discourse entities, such as definite noun phrases. But many definite noun phrases are not anaphoric because their meaning can be understood from general world knowledge (e.g., "the White House" or "the news media"). We have developed a corpus-based algorithm for automatically identifying definite noun phrases that are non-anaphor...

Use of Articles in Learning English as a Foreign Language: A Study of Iranian English Undergraduates

The significance of error analysis for the learner, the teacher and the researcher is now widely recognized. Earlier studies of error analysis concentrated on intersystematic comparison of the “native language” and the “target language” and drew the required data largely from intuitions and impressionistic observations. This study was conducted on the basis of the following observations: (1) to...

Publication date: 2005